Under left truncation, data $(X_i, Y_i)$ are observed only when $Y_i \le X_i$. Usually, the distribution
function $F$ of the $X_i$ is the target of interest. In this paper, we study linear functionals $\int \varphi\,dF_n$
of the nonparametric maximum likelihood estimator (MLE) of $F$, the Lynden-Bell estimator $F_n$.
A useful representation of $\int \varphi\,dF_n$ is derived which yields asymptotic normality under optimal
moment conditions on the score function $\varphi$. No continuity assumption on $F$ is required. As a
by-product, we obtain the distributional convergence of the Lynden-Bell empirical process on
the whole real line.
1. Introduction and main results

In this paper, we provide some further methodology for statistical analysis of data which
are truncated from the left. To be more specific, let $(X_i, Y_i)$, $1 \le i \le N$, denote a sample
of independent bivariate data such that, for each $i$, $X_i$ is also independent of $Y_i$. Denote
by $F$ and $G$, respectively, the unknown distribution functions of $X$ and $Y$. Typically, $F$
is the target of interest. Now, under left truncation, $X_i$ is observed only when $Y_i \le X_i$.
As a consequence, the empirical distribution of the $X$'s is unavailable and cannot serve
as a basic process to compute other statistics.

The nonparametric maximum likelihood estimator of F for left-truncated data was first 
derived by Lynden-Bell (1971). Its first mathematical investigation may be attributed to 
Woodroofe (1985), who also reviewed some examples of truncated data from astronomy 
and economics; see also Wang (1989) for applications in the analysis of AIDS data. 

Now, denoting by $n$ the number of data which are actually observed, we have, by the
strong law of large numbers (SLLN),

$$\frac{n}{N} \to \alpha \equiv P(Y \le X) \quad\text{as } N \to \infty \text{ with probability one.}$$

Without further mention, we shall assume that $\alpha > 0$ because, otherwise, nothing will be
observed. Of course, $\alpha$ will be unknown. Conditionally on $n$, the observed data are still
independent, but the joint distribution of $X_i$ and $Y_i$ becomes

$$H^*(x,y) = P(X \le x, Y \le y \mid Y \le X) = \alpha^{-1} \int_{(-\infty,x]} G(y \wedge z)\,F(dz).$$

The marginal distribution of the observed $X$'s thus equals

$$F^*(x) = \alpha^{-1} \int_{(-\infty,x]} G(z)\,F(dz). \tag{1.1}$$

It may be consistently estimated by the empirical distribution function of the known
$X$'s:

$$F_n^*(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{X_i \le x\}}.$$

The problem, however, is one of reconstructing $F$ and not $F^*$ from the available data
$(X_i, Y_i)$, $1 \le i \le n$. A crucial quantity in this context is the function

$$C(z) = P(Y \le z \le X \mid Y \le X) = \alpha^{-1} G(z)[1 - F(z-)], \tag{1.2}$$

where

$$F(z-) = \lim_{x \uparrow z} F(x)$$

is the left-continuous version of $F$ and $F\{z\} = F(z) - F(z-)$ is the $F$-mass at $z$. The
function $C$ may be consistently estimated by

$$C_n(z) = n^{-1} \sum_{i=1}^{n} 1_{\{Y_i \le z \le X_i\}}.$$

It is very helpful to express the cumulative hazard function of $F$,

$$\Lambda(x) = \int_{(-\infty,x]} \frac{F(dz)}{1 - F(z-)},$$

in terms of estimable quantities. For this, let

$$a_G = \inf\{x : G(x) > 0\}$$

be the largest lower bound for the support of $G$. Define $a_F$ similarly for $F$. From (1.1) and (1.2),
we obtain

$$\int_{(a_G,x]} \frac{F(dz)}{1 - F(z-)} = \int_{(a_G,x]} \frac{F^*(dz)}{C(z)}. \tag{1.3}$$
Provided that $a_G \le a_F$ and $F\{a_F\} = 0$, the left-hand side equals $\Lambda(x)$. Otherwise, this is
no longer true and, as Woodroofe (1985) pointed out, $F$ cannot be fully recovered from
the available data. The situation is similar for right-censored data; see Stute and Wang
(1993) for a detailed discussion of (upper) boundary effects there.
Throughout the paper, we shall therefore assume that

$$a_G \le a_F \quad\text{and}\quad F\{a_F\} = 0. \tag{1.4}$$

If $a_G < a_F$, the second assumption is superfluous, while it is automatically satisfied when
$F$ is continuous. For the moment, however, no other assumptions such as continuity of
$F$ or $G$ will be needed.

In differential terms, equation (1.3) leads to

$$F(dx) = (1 - F(x-))\,\frac{F^*(dx)}{C(x)}. \tag{1.5}$$



The Lynden-Bell (1971) estimator $F_n$ of $F$ is obtained as the solution of the so-called
self-consistency equation, that is, as the solution of (1.5) after having replaced $F^*$ and
$C$ with $F_n^*$ and $C_n$, respectively:

$$F_n(dx) = (1 - F_n(x-))\,\frac{F_n^*(dx)}{C_n(x)}. \tag{1.6}$$

Solving for $F_n$ yields the product integral representation of $F_n$,

$$1 - F_n(x) = \prod_{\text{distinct } X_i \le x} \left[1 - \frac{F_n^*\{X_i\}}{C_n(X_i)}\right]. \tag{1.7}$$

If there are no ties among the $X$'s, (1.7) simplifies to become

$$1 - F_n(x) = \prod_{X_i \le x} \left[1 - \frac{1}{nC_n(X_i)}\right]. \tag{1.8}$$

Note that $nC_n(X_i) \ge 1$, so each ratio is well defined. Since, in this paper, our objective will
be to study general linear statistics based on $F_n$, namely Lynden-Bell integrals $\int \varphi\,dF_n$,
we also introduce the Lynden-Bell weights $W_{in}$ attached to each datum in the $X$-sample.
For this, denote by $X_{1:n} < \cdots < X_{m:n}$ the $m$ distinct order statistics, with $m$ possibly
strictly less than $n$.
From (1.7), we obtain

$$W_{in} \equiv F_n\{X_{i:n}\} = [1 - F_n(X_{i-1:n})]\,\frac{F_n^*\{X_{i:n}\}}{C_n(X_{i:n})} \tag{1.9}$$

and

$$\int \varphi\,dF_n = \sum_{i=1}^{m} W_{in}\,\varphi(X_{i:n}). \tag{1.10}$$
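The weights in (1.9) and the integral in (1.10) are straightforward to compute. The following is a minimal sketch in plain Python, assuming the observed pairs are passed as two lists `x` and `y` with `y[i] <= x[i]`; the function names are illustrative and do not come from the paper.

```python
def lynden_bell_weights(x, y):
    """Distinct ordered X values and their Lynden-Bell weights, following (1.7) and (1.9)."""
    points = sorted(set(x))
    weights = []
    survival = 1.0  # 1 - F_n(X_{i-1:n}), with X_{0:n} = -infinity
    for z in points:
        ties = sum(1 for xi in x if xi == z)                        # n F_n^*{z}
        at_risk = sum(1 for xi, yi in zip(x, y) if yi <= z <= xi)   # n C_n(z), at least `ties`
        ratio = ties / at_risk                                      # F_n^*{z} / C_n(z)
        weights.append(survival * ratio)
        survival *= 1.0 - ratio
    return points, weights


def lynden_bell_integral(phi, x, y):
    """Lynden-Bell integral of phi, i.e. the weighted sum in (1.10)."""
    points, weights = lynden_bell_weights(x, y)
    return sum(w * phi(z) for z, w in zip(points, weights))
```

When no truncation is effective (every `y[i]` lies below the smallest `x`), the weights reduce to the masses of the ordinary empirical distribution function, which gives a quick sanity check.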
For $\varphi = 1_{(-\infty,x]}$, we are back at $F_n(x)$. As for other $\varphi$'s, we refer to Stute and Wang
(1993) or Stute (2004), who considered possible applications of empirical integrals in the
context of right-censored data. More general statistical functionals often admit expansions
in which the leading term is of the form $\int \varphi\,dF_n$, with $\varphi$ denoting the associated
influence function. Since, for a fully observable data set with $F_n$ being the classical
empirical distribution function, $\int \varphi\,dF_n$ is just a sample mean to which, under a second
moment assumption, the central limit theorem (CLT) applies, distributional convergence
of $\int \varphi\,dF_n$ therefore constitutes an extension of the CLT to the left-truncation case. The
corresponding SLLN has been studied in papers by He and Yang (1998a, 1998b). The
CLT for censored data is due to Stute (1995).

In the present situation, the CLT is much more elusive than for randomly censored
data. In the censored data case, the right tails create technical difficulties, but not the
left. For left truncation, however, both sides create problems. This is already seen via
the functions $C$ and $C_n$, which decrease to zero on the left and on the right tail, and
with $C_n$ appearing in denominators. The only trivial bound is $nC_n(X_i) \ge 1$, which keeps
every ratio well defined. Moreover, $C$ and $C_n$ are not monotone, again
contrary to the censored data case, where the role of the $C$'s is played by survival
functions. This non-monotone feature of $C_n$ creates additional technical complications for
truncated data. Worse than that, $C_n$ may also become zero between the data points in
the central part. Keiding and Gill (1990), who recognized the danger of these sets, have
called these holes the "empty inner risk sets". Not all authors seem to know about these
problems since a detailed study of the $C_n$ process is sometimes missing. A consequence of
these empty inner risk sets, as revealed by the representation (1.7), is the loss of mass of
$F_n$ after the first such hole. More specifically, suppose that, in terms of Keiding and Gill
(1990), a risky hole exists at $X_j$. This means that $X_j$ is not covered by any other pair
$(X_i, Y_i)$. Hence, $nC_n(X_j) = nF_n^*\{X_j\}$. From (1.7), we get that all data points to the right of $X_j$
have mass zero under $F_n$. In proofs, this disallows the incorporation of exponential
representations of the weights. To circumvent this difficulty, we first study an asymptotically
equivalent estimator $\hat F_n$. This estimator is defined via

$$\hat F_n\{X_{i:n}\} = \frac{F_n^*\{X_{i:n}\}}{C_n(X_{i:n})} \prod_{j=1}^{i-1} \left[1 - \frac{nF_n^*\{X_{j:n}\}}{nC_n(X_{j:n}) + 1}\right]. \tag{1.11}$$

The extra summand 1 in the denominator allows for contributions from data which
have holes on their left. As we shall see in a small simulation study, it may have a
robustifying effect, resulting in smaller mean squared error (MSE) for small sample sizes.
Asymptotically, $\int \varphi\,dF_n$ and $\int \varphi\,d\hat F_n$ are equivalent at the $n^{1/2}$-rate, which facilitates the
asymptotic theory for $\int \varphi\,dF_n$. Another technical problem is caused by possible ties. In
such a situation, (1.8) does not apply. Lemma 1.1 will show how the general case covered
by (1.7) and (1.9) may be traced back to the case of a continuous $F^*$.

The main result of this paper, Theorem 1.1, provides a representation of $\int \varphi\,dF_n$ as
a sum of i.i.d. random variables under minimal assumptions on $\varphi$ and the truncation
mechanism. Usually, linear i.i.d. representations of complicated estimators will include
the Hajek projection of the statistic of interest. In the case of (1.10), however, this
projection can be computed only up to remainders, and this is exactly what Theorem
1.1 does. More precisely, the proof of Theorem 1.1 proceeds in two steps. In the first
step, we have to identify all error terms which are negligible. The leading terms will turn
out to be V-statistics; see Serfling (1980). Finally, an application of the Hajek projection
to the leading terms yields the desired i.i.d. representation. Asymptotic
normality then follows immediately. We shall also add some comments on a so-called
uniform representation. Proofs will be given in Section 3.
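The effect of an empty inner risk set, discussed above, can be seen on a three-point toy data set. The sketch below assumes untied data; `extra=0` gives the Lynden-Bell weights from (1.8), while `extra=1` adds the summand 1 in the way suggested by (1.12) and (1.13). The function name and the data are hypothetical and serve only as an illustration.

```python
def product_limit_weights(x, y, extra=0):
    """Weights for untied data: extra=0 follows (1.8), extra=1 adds 1 to n*C_n in the product."""
    pairs = sorted(zip(x, y))
    weights = []
    survival = 1.0  # running product over the data points strictly to the left
    for z, _ in pairs:
        at_risk = sum(1 for xi, yi in pairs if yi <= z <= xi)  # n C_n(z) >= 1
        weights.append(survival / at_risk)
        survival *= 1.0 - 1.0 / (at_risk + extra)
    return [z for z, _ in pairs], weights


# X = 1 is covered only by its own pair, so there is a risky hole at 1.
x = [1.0, 3.0, 4.0]
y = [0.0, 2.0, 2.5]
_, lb_weights = product_limit_weights(x, y)            # all mass sits at X = 1
_, mod_weights = product_limit_weights(x, y, extra=1)  # later points keep positive weight
```

With these data, the Lynden-Bell weights of the two points to the right of the hole vanish, whereas the modified weights stay positive.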
Theorem 1.1 will hold under the following two assumptions:

(A) (i) $\int \varphi^2/G\,dF < \infty$;
    (ii) $\int dF/G < \infty$.

Condition (ii) already appeared in Woodroofe (1985) in his study of the Lynden-Bell
process, that is, when he considered indicators $\varphi = 1_{(-\infty,x_0]}$. Non-technically speaking,
it is needed to guarantee enough information in the left tails to estimate $F$ at the rate
$n^{1/2}$. Under slightly stronger assumptions, Stute (1993) obtained an almost sure
representation with sharp bounds on the remainder, again for indicators; see also Chao and Lo
(1988). Condition (i) guarantees, among other things, that the leading terms in the i.i.d.
representation admit a finite second moment so that asymptotic normality holds. Since
$G \le 1$, it implies $\int \varphi^2\,dF < \infty$, which is the standard finite moment assumption when
no truncation occurs. When $\varphi$ has a finite second $F$-moment and is locally bounded in
a neighborhood of $a_G$, then (i) is implied by (ii). Note also that (i) and (ii) are always
satisfied when $a_G < a_F$ and $\int \varphi^2\,dF < \infty$. A CLT for truncated data is also contained in
Sánchez Sellero et al. (2005). Apart from continuity assumptions, they also need conditions
which, in our notation, require finiteness of the integral

p>l{x){l-F{x))-''F{dx), 

where $|\varphi| \le \varphi_0$. Since, however, this integral equals infinity for constant $\varphi_0$, their result
is not applicable to bounded $\varphi$'s, not to mention $\varphi$'s which increase to infinity as $x \to \infty$.
Rather, finiteness of the above integral is only obtained for $\varphi_0$'s which converge to zero
fast enough in the right tails.

The focus of this paper is, however, on distributional convergence for which (A) will
suffice. Theorem 1.1 is formulated for the case when $F$ is continuous. This guarantees that
among the observed $X$'s, there will be no ties, with probability one. In such a situation,
we obtain

$$\int \varphi\,d\hat F_n = \int \frac{\varphi(x)\gamma_n(x)}{C_n(x)}\,F_n^*(dx), \tag{1.12}$$

with

$$\gamma_n(x) = \exp\left\{ n \int_{-\infty}^{x-} \ln\left[1 - \frac{1}{nC_n(y) + 1}\right] F_n^*(dy) \right\}. \tag{1.13}$$

At the end of this section, we shall show how general Lynden-Bell integrals may be traced
back to the present case.



The CLT under random truncation 






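Theorem 1.1. Under Assumptions (A) and (1.4), assume that $F$ is continuous. We
then have

$$\int \varphi\,[dF_n - dF] = \int \frac{\psi(y)}{C(y)}\,[F_n^*(dy) - F^*(dy)] - \int \frac{C_n(y) - C(y)}{C^2(y)}\,\psi(y)\,F^*(dy) + o_P(n^{-1/2}),$$

where

$$\psi(y) = \varphi(y)(1 - F(y)) - \int_{\{y<x\}} \frac{\varphi(x)(1 - F(x))}{C(x)}\,F^*(dx) = \int_{\{y<x\}} [\varphi(y) - \varphi(x)]\,F(dx).$$

For indicators $\varphi = 1_{(-\infty,x_0]}$, the leading term already appears in Theorem 2 in Stute
(1993).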
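As an illustrative check, which is not part of the original text, the function $\psi$ can be evaluated in closed form for the indicator $\varphi = 1_{(-\infty,x_0]}$, using only the second expression for $\psi$ in Theorem 1.1:

```latex
\begin{align*}
\psi(y) &= \int_{\{y<x\}} \bigl[1_{\{y\le x_0\}} - 1_{\{x\le x_0\}}\bigr]\,F(dx) \\
        &= 1_{\{y\le x_0\}} \int_{\{x>x_0\}} F(dx)
         = \bigl(1-F(x_0)\bigr)\,1_{\{y\le x_0\}} .
\end{align*}
```

For $y > x_0$ both indicators vanish on $\{x > y\}$, so $\psi(y) = 0$ there. The leading term for an indicator is therefore $1 - F(x_0)$ times a linearized cumulative hazard difference restricted to $y \le x_0$.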

Remark 1.1. If one checks the proof of Theorem 1.1 step by step, the following fact
will be revealed. If, rather than a single $\varphi$, one considers a collection $\{\varphi\}$, then the error
terms are uniformly small in the sense that the remainder is $o_P(n^{-1/2})$ uniformly in
$\varphi$, provided that $|\varphi| \le \varphi_0$ for some $\varphi_0$ satisfying $\int \varphi_0^2/G\,dF < \infty$. Actually, all remainders
may be bounded from above in absolute value by replacing $|\varphi|$ by $\varphi_0$. Compared with
Sánchez Sellero et al. (2005), no VC property for the $\varphi$'s is needed for a representation
as a V-statistic process. For the i.i.d. representation, one has to guarantee that the errors
in the Hajek projection are also uniformly small. These errors, however, form a class of
degenerate U-statistics. This kind of process was studied in Stute (1994) and the achieved
bounds are useful for handling the error terms in the second half of the proof. Details
are omitted.



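Corollary 1.1. Under the assumptions of Theorem 1.1, we have

$$n^{1/2} \int \varphi\,[dF_n - dF] \to \mathcal{N}(0, \sigma^2) \quad\text{in distribution},$$

with

$$\sigma^2 = \operatorname{Var}\left\{ \frac{\psi(X)}{C(X)} - \int \frac{1_{\{Y \le y \le X\}}}{C^2(y)}\,\psi(y)\,F^*(dy) \right\},$$

where $(X, Y)$ has distribution $H^*$. It is not difficult to see that $\sigma^2 < \infty$ under (A).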

Finally, if, in Remark 1.1, we take for $\{\varphi\}$ the class of all indicators $\varphi = 1_{(-\infty,x]}$ and
set $\varphi_0 = 1$, we obtain the following corollary.

Corollary 1.2. Under $\int dF/G < \infty$ and (1.4), and for a continuous $F$, we have

$$F_n(x) - F(x) = \int \frac{\psi_x}{C}\,[dF_n^* - dF^*] - \int \frac{C_n - C}{C^2}\,\psi_x\,dF^* + o_P(n^{-1/2})$$

uniformly in $x$, where $\psi_x$ is the $\psi$ belonging to $\varphi = 1_{(-\infty,x]}$.

Remark 1.2. To make the point clear, Corollary 1.2 provides a representation which
holds uniformly on the whole real line and not only on subintervals $(-\infty, b]$ with $b <
b_F = \sup\{x : F(x) < 1\}$, as is usually the case in the literature.

A major technical problem for proving Theorem 1.1 for a general $F$ is caused by the
fact that for discontinuous $F$, ties may arise. As before, denote by $X_{1:n} < \cdots < X_{m:n}$
the $m$ ordered distinct data in the observed $X$-sample. To circumvent ties, one may use
the fact that each $X_i$ may be written in the form $X_i = F^{*-1}(U_i)$, where $U_1, \ldots, U_n$ is a
sample of independent random variables with a uniform distribution on $(0, 1)$ and $F^{*-1}$
is the quantile function of $F^*$. The construction of the $U$'s is similar to the construction
in Lemma 2.8 of Stute and Wang (1993), with $H$ there replaced by $F^*$ here.

For the following, recall that a quantile function is continuous from the left. Furthermore,
with probability one, $F^{*-1}$ is also right-continuous at each $U_i$. With this in
mind, one can see that the corresponding truncating sample for the $U$'s consists of
$F^*(Y_i-)$, $1 \le i \le n$. If we denote by $C_n^U$ the $C_n$-function corresponding to the
pseudo-observations $(U_i, F^*(Y_i-))$, $1 \le i \le n$, and if we let $U_{i1:n} < \cdots < U_{id_i:n}$ denote the ordered
$U$'s that satisfy $F^{*-1}(U_{ij:n}) = X_{i:n}$, with $d_i = nF_n^*\{X_{i:n}\}$, then

$$nC_n^U(U_{i1:n}) = nC_n(X_{i:n}) \tag{1.14}$$

and

$$nC_n^U(U_{ij:n}) = nC_n^U(U_{i,j-1:n}) - 1 \quad\text{for } 2 \le j \le d_i. \tag{1.15}$$

Also, note that the Lynden-Bell estimator $F_n^U$ for the pseudo-observations satisfies (1.8),
that is,

$$1 - F_n^U(u) = \prod_{U_i \le u} \left[1 - \frac{1}{nC_n^U(U_i)}\right].$$

Introducing the function $\varphi^* = \varphi \circ F^{*-1}$, the analog of (1.10) becomes

$$\int \varphi^*\,dF_n^U = \sum_{i=1}^{m} \sum_{j=1}^{d_i} \varphi(F^{*-1}(U_{ij:n}))\,\frac{1 - F_n^U(U_{ij:n}-)}{nC_n^U(U_{ij:n})}.$$

In Lemma 1.1, we will show that for each $1 \le i \le m$ and every $1 \le j \le d_i$, we have

$$\frac{1 - F_n^U(U_{ij:n}-)}{nC_n^U(U_{ij:n})} = \frac{1 - F_n(X_{i-1:n})}{nC_n(X_{i:n})}, \tag{1.16}$$




where $X_{0:n} = -\infty$. It follows from (1.9), (1.10) and (1.16) that

$$\int \varphi^*\,dF_n^U = \int \varphi\,dF_n \quad\text{with probability one.}$$

In conclusion, the study of Lynden-Bell integrals may be traced back to the case when
the variables of interest are uniform on $(0, 1)$ and, therefore, with probability one, have
no ties. At the same time, our handling of ties does not require external randomization,
but maintains the product limit structure of the weights and hence of the estimators.

Lemma 1.1. For each $1 \le i \le m$ and $1 \le j \le d_i$, equation (1.16) holds.
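The reduction to untied pseudo-observations can be checked numerically on a small example. The sketch below does not construct the uniform variables themselves. As a stand-in, it shifts the members of each tie block by tiny distinct amounts, which (as an assumption of this illustration) preserves every order relation between the $X$'s and the truncation variables, just as the $U$'s do. All names and data are hypothetical.

```python
def lb_integral(x, y, phi_values):
    """Lynden-Bell integral via (1.7), (1.9) and (1.10); ties among x are allowed."""
    total, survival = 0.0, 1.0
    for z in sorted(set(x)):
        block = [p for xi, p in zip(x, phi_values) if xi == z]      # phi at the tied points
        at_risk = sum(1 for xi, yi in zip(x, y) if yi <= z <= xi)   # n C_n(z)
        ratio = len(block) / at_risk
        total += survival * ratio * (sum(block) / len(block))
        survival *= 1.0 - ratio
    return total


def break_ties(x, eps=1e-6):
    """Shift the k-th member of each tie block by k*eps (stand-in for the U's)."""
    seen, shifted = {}, []
    for xi in x:
        k = seen.get(xi, 0)
        shifted.append(xi + k * eps)
        seen[xi] = k + 1
    return shifted


x = [2, 2, 3, 3, 3, 5]
y = [1, 2, 1, 3, 2, 4]
phi_values = [float(v) for v in x]                   # phi(x) = x, kept fixed per observation
tied = lb_integral(x, y, phi_values)                 # uses tie counts, as in (1.7)
untied = lb_integral(break_ties(x), y, phi_values)   # uses the untied formula (1.8)
```

Both evaluations give the same value, in line with Lemma 1.1: within a tie block, each pseudo-observation receives the same weight, and the block weights add up to $W_{in}$.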



2. Simulations 

It is interesting to compare the small sample size behaviors of $F_n$ and $\hat F_n$. In a simulation
study, we considered $\varphi(x) = x$, that is, the target was the mean lifetime of $X$. This $\varphi$
is the canonical score function in the classical CLT. Recall that via truncation, there is
a sampling bias which would result in an upward bias if we were to take the empirical
distribution function of the $X$'s and not $F_n$ or $\hat F_n$. Introducing more or less complicated
weights has the effect, among other things, that compared with the empirical distribution
function, the bias is reduced.

In the following, we report on some simulation results which are part of a much more
extensive study. Typically, for this $\varphi$, $\int \varphi\,d\hat F_n$ outperforms $\int \varphi\,dF_n$ in terms of the MSE
when $10 \le n \le 40$ and truncation is heavy. For larger $n$ ($n > 40$), the difference is negligible.
MSE was computed via Monte Carlo. See Table 1 for details.

In the simulations, both $X$ and $Y$ were exponentially distributed with parameter 1.



Table 1. Comparison of MSE

Sample size    MSE($\int \varphi\,dF_n$)    MSE($\int \varphi\,d\hat F_n$)
n = 10         0.93                         0.56
n = 20         0.45                         0.39
n = 30         0.38                         0.32
n = 40         0.35                         0.31
n = 50         0.29                         0.27
n = 60         0.26                         0.24
n = 70         0.24                         0.23
n = 80         0.22                         0.22
n = 90         0.22                         0.21
n = 100        0.20                         0.19
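A Monte Carlo comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions: pairs are drawn with $X$ and $Y$ independent Exp(1) until $n$ pairs with $Y \le X$ are observed, the Lynden-Bell weights follow (1.8), and the modified weights are taken in the form suggested by (1.12) and (1.13). The function names, the number of replications and the seed are arbitrary choices. The output is not claimed to reproduce Table 1, since the paper does not report its exact simulation settings.

```python
import random


def truncated_sample(n, rng):
    """Draw independent Exp(1) pairs until n pairs with Y <= X are observed; sort by X."""
    pairs = []
    while len(pairs) < n:
        x, y = rng.expovariate(1.0), rng.expovariate(1.0)
        if y <= x:
            pairs.append((x, y))
    return sorted(pairs)


def mean_estimates(pairs):
    """Estimated means under the Lynden-Bell and the modified weights (pairs sorted by X)."""
    lb_mean = mod_mean = 0.0
    lb_surv = mod_surv = 1.0
    for z, _ in pairs:
        at_risk = sum(1 for xi, yi in pairs if yi <= z <= xi)  # n C_n(z) >= 1
        lb_mean += z * lb_surv / at_risk
        mod_mean += z * mod_surv / at_risk
        lb_surv *= 1.0 - 1.0 / at_risk
        mod_surv *= 1.0 - 1.0 / (at_risk + 1)
    return lb_mean, mod_mean


def monte_carlo_mse(n, reps=1000, seed=0):
    """Monte Carlo MSE of both estimators of the mean; the true mean of Exp(1) is 1."""
    rng = random.Random(seed)
    sq_lb = sq_mod = 0.0
    for _ in range(reps):
        lb, mod = mean_estimates(truncated_sample(n, rng))
        sq_lb += (lb - 1.0) ** 2
        sq_mod += (mod - 1.0) ** 2
    return sq_lb / reps, sq_mod / reps
```

Calling `monte_carlo_mse(10)` returns the pair of estimated MSEs for $n = 10$. Larger values of `reps` reduce the Monte Carlo error at the cost of run time.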
3. Proofs

To prove Theorem 1.1, note that under continuity of $F$, (1.8) applies. Also, we may
assume, without loss of generality, that all data are non-negative. Hence, all integrals
appearing hereafter will be over the positive real line.

In our first lemma, we provide a bound for the function $C/C_n$. This will be needed in
proofs to handle negligible terms. Although, by Glivenko-Cantelli, $C_n - C \to 0$ uniformly,
bounding the ratio is more delicate in view of possible holes and the non-monotonicity
of $C$ and $C_n$.



Lemma 3.1. Assume $F$ is continuous. For any $\lambda$ such that $\alpha\lambda \ge 1$, one has

$$P\left( \sup_{X_i : X_i \ge b} \frac{C(X_i)}{C_n(X_i)} \ge \lambda \right) \le \lambda e \exp[-G(b)\alpha\lambda].$$

Proof. The proof is similar to that of Lemma 1.2 in Stute (1993), which provides an
exponential bound for the supremum extended over the left tails $X_i \le b$ rather than the
right tails. □

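Together with the aforementioned bound from Stute (1993), Lemma 3.1 immediately
implies

$$\sup_{1 \le i \le n} \frac{C(X_i)}{C_n(X_i)} = O_P(1) \quad\text{as } n \to \infty. \tag{3.1}$$

Assertion (3.1) will be of some importance in forthcoming proofs as it will allow us to
replace the random $C_n$ appearing in denominators with the deterministic $C$.
We are now ready to expand $\int \varphi\,d\hat F_n$. By (1.12) and (1.13),

$$\int \varphi\,d\hat F_n = \int \frac{\varphi(x)}{C_n(x)} \exp\left\{ n \int_0^{x-} \ln\left[1 - \frac{1}{nC_n(y) + 1}\right] F_n^*(dy) \right\} F_n^*(dx) = \int \frac{\varphi(x)}{C_n(x)}\,\gamma_n(x)\,F_n^*(dx).$$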



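Set

$$\gamma(x) = 1 - F(x) = \exp\left\{ -\int_0^{x-} \frac{F^*(dy)}{C(y)} \right\}.$$

From Taylor's expansion, we obtain

$$\gamma_n(x) = \gamma(x) + e^{\Delta_n(x)}\,[B_n(x) + D_{n1}(x) + D_{n2}(x)],$$

where

$$B_n(x) = n \int_0^{x-} \ln\left[1 - \frac{1}{nC_n(y) + 1}\right] F_n^*(dy) + \int_0^{x-} \frac{F_n^*(dy)}{C_n(y) + 1/n},$$

$$D_{n1}(x) = -\int_0^{x-} \frac{F_n^*(dy) - F^*(dy)}{C(y)},$$

$$D_{n2}(x) = \int_0^{x-} \frac{C_n(y) + 1/n - C(y)}{[C_n(y) + 1/n]\,C(y)}\,F_n^*(dy),$$

and $\Delta_n(x)$ is between the exponents of $\gamma_n(x)$ and $\gamma(x)$. In particular, we have $\Delta_n(x) \le 0$.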
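Setting

$$S_{n1} = \int \frac{\varphi(x)}{C_n(x)}\,e^{\Delta_n(x)} B_n(x)\,F_n^*(dx), \qquad S_{n2} = \int \frac{\varphi(x)\gamma(x)}{C_n(x)}\,F_n^*(dx),$$

$$S_{n3} = \int \frac{\varphi(x)}{C_n(x)}\,e^{\Delta_n(x)} D_{n1}(x)\,F_n^*(dx)$$

and

$$S_{n4} = \int \frac{\varphi(x)}{C_n(x)}\,e^{\Delta_n(x)} D_{n2}(x)\,F_n^*(dx),$$

we thus get

$$\int \varphi\,d\hat F_n = S_{n1} + S_{n2} + S_{n3} + S_{n4}. \tag{3.2}$$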
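The Taylor step behind (3.2) rests on an exact decomposition of the difference of the two exponents, which can be verified directly. In the following check, $E_n(x)$ and $E(x)$ denote the exponents of $\gamma_n(x)$ and $\gamma(x)$. These two symbols are introduced here only for the verification and are not used in the paper.

```latex
\begin{align*}
E_n(x) - E(x)
 &= \underbrace{n\int_0^{x-}\ln\Bigl[1-\tfrac{1}{nC_n(y)+1}\Bigr]F_n^*(dy)
      + \int_0^{x-}\frac{F_n^*(dy)}{C_n(y)+1/n}}_{B_n(x)} \\
 &\quad \underbrace{{}-\int_0^{x-}\frac{F_n^*(dy)-F^*(dy)}{C(y)}}_{D_{n1}(x)}
   + \underbrace{\int_0^{x-}\Bigl[\frac{1}{C(y)}-\frac{1}{C_n(y)+1/n}\Bigr]F_n^*(dy)}_{D_{n2}(x)} ,
\qquad
\frac{1}{C}-\frac{1}{C_n+1/n} = \frac{C_n+1/n-C}{[C_n+1/n]\,C}.
\end{align*}
```

The mean value theorem then gives $e^{E_n(x)} - e^{E(x)} = e^{\Delta_n(x)}[E_n(x) - E(x)]$ for some $\Delta_n(x)$ between $E_n(x)$ and $E(x)$. Since both exponents are non-positive, so is $\Delta_n(x)$.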

In the next lemma, we study the functions $D_{n1}$ and $D_{n2}$ more closely. To motivate the
following, note that for each fixed $x_0$ such that $F(x_0) < 1$,

$$D_{n1}(x_0) \to 0 \quad\text{with probability one.}$$

Actually, by standard Glivenko-Cantelli arguments,

$$D_{n1}(x) \to 0 \quad\text{with probability one uniformly on } x \le x_0.$$

Similarly, when we consider the standardized processes $x \to n^{1/2} D_{n1}(x)$, it is easy to
show their distributional convergence in the Skorokhod space $D[0, x_0]$. Things change,
however, if we study $D_{n1}$ on the whole support of $F^*$. Since

$$\int_0^{\infty} \frac{F^*(dy)}{C(y)} = \int_0^{\infty} \frac{F(dy)}{1 - F(y)} = \infty,$$

we cannot expect uniform convergence on the whole support of $F^*$. The situation is
similar for the cumulative hazard function $\Lambda$, where uniform convergence of the
Nelson-Aalen estimator $\Lambda_n$ may be obtained only on compact subsets of the support of $F$.

When one evaluates these processes at $x = X_i$, though, it is known that the uniform
deviation between $\Lambda_n$ and $\Lambda$ does not go to zero, but remains at least bounded; see
Theorem 2.1 in Zhou (1991). Similar things turn out to be true for $D_{n1}$ and $D_{n2}$, as
Lemma 3.2 will show. Our proofs are different, though, since compared with Zhou (1991),
we shall apply a truncation technique which in proofs guarantees that the suprema of
$D_{n1}$ and $D_{n2}$ are bounded on large, but not too large, sets.



Lemma 3.2. We have

$$\sup_{1 \le i \le n} |D_{n1}(X_i)| = O_P(1) \tag{3.3}$$

and

$$\sup_{1 \le i \le n} |D_{n2}(X_i)| = O_P(1). \tag{3.4}$$

Proof. Assume, without loss of generality, that $F^*$ has unbounded support on the right.
Otherwise, replace $\infty$ by $b_{F^*} = \sup\{x : F^*(x) < 1\}$. For a given $\varepsilon > 0$, one may find some
small $c = c_\varepsilon$ and a sequence $a_n \to \infty$ such that $1 - F^*(a_n) = c/n$ and $P(X_{n:n} \le a_n) \ge 1 - \varepsilon$.
Actually, this follows from

$$P(X_{n:n} \le a_n) = \left(1 - \frac{c}{n}\right)^n \to \exp(-c).$$

It therefore suffices to bound $D_{n1}$ and $D_{n2}$ on $(-\infty, a_n]$. By Lemma 3.1, we may replace
$C_n$ in the denominator by $C$. Hence, with large probability and up to a constant factor,
(3.4) is bounded from above by

$$\int_0^{a_n} \frac{|C_n(y) - C(y)|}{C^2(y)}\,F_n^*(dy) + \frac{1}{n} \int_0^{a_n} \frac{F_n^*(dy)}{C^2(y)}.$$

Using (ii) of (A), it is easily seen that the expectation of the second term is bounded as
$n \to \infty$. The expectation of the first term is less than or equal to

""E|C„_i(y)-C(;/)| + l/n^,^^^^ 



^^/«7^^"^^- ^-(d.)^o(i). 

This proves (3.4). As for $D_{n1}$, we already mentioned that $D_{n1}$ converges to zero with
probability one uniformly on each interval $[0, x_0]$ with $F(x_0) < 1$. Also, for each fixed $x$,
we have

$$E D_{n1}(x) = 0 \quad\text{and}\quad \operatorname{Var} D_{n1}(x) \le \frac{1}{n} \int_0^{x} \frac{F^*(dy)}{C^2(y)}.$$

Moreover, by the construction of $a_n$ and the definitions of $F^*$ and $C$, we have

$$\operatorname{Var} D_{n1}(a_n) = O(1).$$

We conclude that $D_{n1}(a_n) = O_P(1)$. Applying standard tightness arguments for the
(non-standardized) process $D_{n1}$ (see Billingsley (1968), page 128), we get that $D_{n1}$ is uniformly
bounded on $[0, a_n]$. This, however, implies (3.3) and completes the proof of Lemma 3.2. □
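The choice of the truncation points $a_n$ at the start of the proof can be made explicit. Since the observed $X$'s are i.i.d. with the continuous distribution function $F^*$, a point $a_n$ with $1 - F^*(a_n) = c/n$ exists for every large $n$, and a short computation (a sketch, with $c = c_\varepsilon$ chosen as indicated) gives:

```latex
\[
P(X_{n:n}\le a_n) = \bigl[F^*(a_n)\bigr]^n = \Bigl(1-\frac{c}{n}\Bigr)^n
\longrightarrow e^{-c},
\qquad
e^{-c} > 1-\varepsilon \iff c < -\ln(1-\varepsilon).
\]
```

Hence any fixed $c < -\ln(1 - \varepsilon)$ yields $P(X_{n:n} \le a_n) \ge 1 - \varepsilon$ for all sufficiently large $n$.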

Our next lemma implies that $S_{n1}$ is negligible.

Lemma 3.3. Under the assumptions of Theorem 1.1, we have

$$S_{n1} = o_P(n^{-1/2}).$$

Proof. We first bound $B_n(x)$. Since, for $0 \le x \le 1/2$, we have

$$-x - x^2 \le -x - \frac{x^2}{2(1 - x)} \le \ln(1 - x) \le -x,$$

we obtain

$$-\frac{1}{n} \int_0^{x-} \frac{F_n^*(dy)}{C_n^2(y)} \le B_n(x) \le 0. \tag{3.5}$$

Recall that $nC_n \ge 1$ on the support of $F_n^*$, so the above integrals are all well defined. We
conclude from (3.5) that

$$|S_{n1}| \le n^{-1} \int_0^{\infty} \frac{|\varphi(x)|}{C_n(x)}\,e^{\Delta_n(x)} \int_0^{x-} \frac{F_n^*(dy)}{C_n^2(y)}\,F_n^*(dx).$$

Now, as in the proof of Lemma 3.2, for a given $\varepsilon > 0$, we may choose some small $c = c_\varepsilon$
and a sequence $a_n \to \infty$ such that $1 - F^*(a_n) = c/n$ and $P(X_{n:n} \le a_n) \ge 1 - \varepsilon$. Hence, on
this event, $F_n^*$ has all of its mass on $[0, a_n]$ and integration with respect to $F_n^*$ may thus
be restricted to $0 \le x \le a_n$. Furthermore, by Lemmas 3.1 and 3.2, the processes $C/C_n$
and $D_{n1} + D_{n2}$ remain stochastically bounded when restricted to the support of $F_n^*$ as
$n \to \infty$. Together with $B_n(x) \le 0$, it therefore suffices to show that

$$n^{-1/2} \int_0^{a_n} \frac{|\varphi(x)|\gamma(x)}{C(x)} \int_0^{x-} \frac{F_n^*(dy)}{C^2(y)}\,F_n^*(dx) = o_P(1).$$

The expectation of the left-hand side is less than or equal to 



For any fixed Xo with F{xo) < 1, the integral 

\^{xMx) r- F*{Ay) 



F*{Ax) 
is finite since $1 - F(y)$ is bounded away from zero there and, by assumption,

F*{dy) _ 



The same holds true for the integral



dF 
~G 



< oo. 



C{x) 



It remains to study 



,.0 Cix) C^{y) 
This integral is bounded from above, however, by 



■F*{dx). 



G-\xo) 



°" \ipix)\F{dx) 
l-^(^) 



Now apply Cauchy-Schwarz to get

$$\int_{x_0}^{a_n} \frac{|\varphi(x)|\,F(dx)}{1 - F(x)} \le \left[\int_{x_0}^{a_n} \varphi^2(x)\,F(dx)\right]^{1/2} \left[\int_{x_0}^{a_n} \frac{F(dx)}{[1 - F(x)]^2}\right]^{1/2}.$$

The second integral is less than or equal to $[1 - F(a_n)]^{-1}$. Since

$$\frac{c}{n} = 1 - F^*(a_n) = \alpha^{-1} \int_{(a_n,\infty)} G(z)\,F(dz) \le \alpha^{-1}(1 - F(a_n)),$$

the second square root is $O(n^{1/2})$. On the other hand, the first integral can be made
arbitrarily small by choosing $x_0$ large enough. This concludes the proof of Lemma 3.3. □

Next, we study $S_{n2}$. For this, the following lemma will be crucial.

Lemma 3.4. Under the assumptions of Theorem 1.1, we have

$$I_n \equiv \int \frac{\varphi\gamma\,(C_n - C)^2}{C^2 C_n}\,dF_n^* = o_P(n^{-1/2}).$$

Proof. As in the proof of Lemma 3.3, we have $X_i \le a_n$ for all $i = 1, \ldots, n$ with large
probability. Similarly, consider a sequence $b_n > 0$ such that

$$F^*(b_n) = \frac{c}{n}, \quad c \text{ sufficiently small},$$

such that $X_i \ge b_n$ for $i = 1, 2, \ldots, n$ with large probability. In other words, up to an event
of small probability, we may restrict integration in $I_n$ to the interval $[b_n, a_n]$. Also, in
view of Lemma 3.1, the $C_n$ in the denominator may be replaced by $C$ (times a constant),
with large probability. Hence, on a set with large probability, we have



1=1 



\(p{Xi)\^{Xt)l[t,^<Xi<a„} 
C2(X,)C„(X,) 

IiY.(My.<x.<x,} - C{X,)) + 1(1 - CiX,)) 

|VP(X,)|7(^.)1{...} / 1 



(3.6) 



n ■ 



-y 



C3(X0 
\^{XMX^)li...} 



J2il{...y-C{X,)) 



^ c2(xoc„(xo • 

The first sum has an expectation not exceeding 



Fix a small positive $x_1$ and, as in the previous proof, a large $x_0$. Assume, without loss of
generality, that $b_n < x_1 < x_0 < a_n$. The integral

$$\int_{x_1}^{x_0} \frac{|\varphi|\gamma}{C^2}\,dF^*$$

is finite. So, the middle part contributes an error $O(1/n)$, which is smaller than desired.
The upper part, $\int_{x_0}^{a_n} \ldots$, is dealt with as in the proof of Lemma 3.3, yielding a bound
$o(n^{-1/2})$ as $x_0$ gets large. The same holds true for the lower part. Finally, the second
sum in (3.6) may be studied along the same lines as was the first, by starting with the
inequality $nC_n(X_i) \ge 1$. The proof is thus complete. □

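Corollary 3.1. Under the assumptions of Theorem 1.1, we have

$$S_{n2} = \int \frac{\varphi\gamma}{C}\,dF_n^* - \int \frac{\varphi\gamma\,(C_n - C)}{C^2}\,dF_n^* + o_P(n^{-1/2}).$$

We shall come back to $S_{n2}$ later, but first proceed to $S_{n3}$. Recalling $D_{n1}$, we have

$$S_{n3} = -\int \frac{\varphi(x)\,e^{\Delta_n(x)}}{C_n(x)} \int_0^{x-} \frac{F_n^*(dy) - F^*(dy)}{C(y)}\,F_n^*(dx).$$

To get an expansion for $S_{n3}$, the next lemma will be crucial.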
Lemma 3.5. Under the assumptions of Theorem 1.1, we have

$$II_n \equiv \int \frac{\varphi(x)\,[e^{\Delta_n(x)} - \gamma(x)]}{C_n(x)} \int_0^{x-} \frac{F_n^*(dy) - F^*(dy)}{C(y)}\,F_n^*(dx) = o_P(n^{-1/2}).$$

Proof. By Cauchy-Schwarz, on a set with large probability, 



nlll < 



°" I^(^)l7(^) 



F,*idy)-F*idy) 



C{y) 



K(dx) 



Cn[x) 



F*(dy) 



(3.7) 



By Lemma 3.1, we may again replace C„ by C. The expectation of the resulting first 
integral is then less than or equal to 



p |^(.)|7(x) r F*{dy) ^^^^^^ 



C{x) 



CHy) 



which was already shown to be $o(1)$ in the proof of Lemma 3.3. It therefore remains to
show that the second factor in (3.7) is also $o_P(1)$. Putting

$$z_n(x) = \Delta_n(x) + \int_0^{x-} \frac{F^*(dy)}{C(y)},$$

we may write

$$[e^{z_n(x)} - 1]^2 = z_n^2(x)\,e^{2\tilde z_n(x)},$$

where $\tilde z_n(x)$ is between zero and $z_n(x)$. Recalling that $\Delta_n(x)$ is between the first integral
in $B_n(x)$ and $-\int_0^{x-} F^*(dy)/C(y)$, and that $B_n(x) \le 0$, we may infer that $\tilde z_n(x)$ is uniformly
bounded from above in probability, on the support of $F_n^*$, as $n \to \infty$. Hence, it suffices
to bound the term

$$n^{1/2} \int_0^{a_n} \frac{|\varphi(x)|\gamma(x)\,z_n^2(x)}{C(x)}\,F_n^*(dx),$$

which, in turn, is less than or equal to 



,1/2 



< n 



C{x) 



B,,{x) + Dniix) + D„2{x)]^F:{dx) 



'"^fj ''"c(x/"^ [^»(") + (^"i(-) + DMf]F:{dx). 



By (3.5), 



W{x)\^{x)Bl{x) 



F*{dx)<n- 



°" I'^(^)l7(^) 
C{x) 



Cl{y) 



F*{dx) 
To bound the right-hand side, first replace $C_n$ with $C$. The squared term may then be
viewed as a V-statistic, of which the leading term is a U-statistic. Its expectation is



n — 1 



F*idy) 



Thus, we have to show that 



\ipix)\ 



F*{dy) 



1 2 



This follows as in the proof of Lemma 3.3. Only the powers of $1 - F(x)$ are different.
Finally, the error terms involving $D_{n1}$ and $D_{n2}$ may be dealt with similarly. The proof
is thus complete. □

Lemma 3.6. Under the assumptions of Theorem 1.1, we have

$$\int \frac{\varphi(x)\gamma(x)\,[C_n - C](x)}{C(x)\,C_n(x)} \int_0^{x-} \frac{F_n^*(dy) - F^*(dy)}{C(y)}\,F_n^*(dx) = o_P(n^{-1/2}).$$

Proof. As above. □

Lemmas 3.5 and 3.6 immediately imply the following corollary, which brings us to the
desired representation of $S_{n3}$.
Corollary 3.2. Under the assumptions of Theorem 1.1, we have the representation

$$S_{n3} = -\int \frac{\varphi(x)\gamma(x)}{C(x)} \int_0^{x-} \frac{F_n^*(dy) - F^*(dy)}{C(y)}\,F_n^*(dx) + o_P(n^{-1/2}).$$

We now proceed to $S_{n4}$. As a first step to get the desired representation, we shall need
the following lemma.



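Lemma 3.7. Under the assumptions of Theorem 1.1, $S_{n4}$ admits the expansion

$$S_{n4} = \int \frac{\varphi(x)\,e^{\Delta_n(x)}}{C_n(x)} \int_0^{x-} \frac{C_n(y) + 1/n - C(y)}{C^2(y)}\,F_n^*(dy)\,F_n^*(dx) + o_P(n^{-1/2}). \tag{3.8}$$

Proof. Recalling the definition of $D_{n2}$, the difference between $S_{n4}$ and the leading term
in (3.8) becomes

$$-\int \frac{\varphi(x)\,e^{\Delta_n(x)}}{C_n(x)} \int_0^{x-} \frac{[C_n(y) + 1/n - C(y)]^2}{C^2(y)\,[C_n(y) + 1/n]}\,F_n^*(dy)\,F_n^*(dx).$$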
Taking Lemma 3.1 into account, the last expression is bounded in absolute value from
above, with large probability and up to a constant factor, by

$$\int \frac{|\varphi(x)|\,e^{\Delta_n(x)}}{C(x)} \int_0^{x-} \frac{[C_n(y) + 1/n - C(y)]^2}{C^3(y)}\,F_n^*(dy)\,F_n^*(dx).$$

In the proof of Lemma 3.5, we argued that

$$z_n(x) = \Delta_n(x) + \int_0^{x-} \frac{F^*(dy)}{C(y)}$$

is uniformly bounded from above, on the support of $F_n^*$, as $n \to \infty$, with probability one.
Consequently, we may replace $\exp(\Delta_n(x))$ by $\gamma(x)$. Also, we may wish to extend the
integral only up to $a_n$. The expectation of the resulting term then does not exceed

These terms already appeared in the proof of Lemma 3.3 and were there shown to be
$o(n^{-1/2})$. The proof is therefore complete. □

In the following, we shall omit the summand $1/n$ in (3.8), since its contribution is also
negligible. The next lemma will enable us to replace $\exp(\Delta_n(x))$ by $\gamma(x)$.

Lemma 3.8. Under the assumptions of Theorem 1.1, we have 

Proof. Cauchy-Schwarz leads to a bound for $n\,III_n^2$ similar to (3.7) for $n\,II_n^2$. The second
factor is exactly the same as before. The first factor is dealt with in the following lemma. □

Lemma 3.9. Under the assumptions of Theorem 1.1, we get 

V,:(dx)=op(n-i/2). 



r \^{x)\^{x) 

Jo C{x) 



Proof. As in previous proofs, it is enough to extend the $x$-integral up to $a_n$. It is then
easily checked that the expectation of the resulting term is less than or equal to (when
$n > 3$)

where

$y \wedge z = \min(y, z)$ and $y \vee z = \max(y, z)$.
Neglecting $4n^{-1}$ for a moment, the rest equals



{n-3)aJo '"^ "Jo Jy C^iy)C^iz) 



< ^ / " \^{^)\ I w/ p. .^ f(dz)J-(dx). 

The last integral already appeared in the proof of Lemma 3.3 and was shown to be
$o(n^{1/2})$. The error term yields similar bounds, so the proof is complete. □

In the final step, $C_n$ in the denominator of (3.8) may be replaced by $C$. The details
are omitted. Hence, we arrive at the following representation of $S_{n4}$.

Corollary 3.3. Under the assumptions of Theorem 1.1, we have

$$S_{n4} = \int \frac{\varphi(x)\gamma(x)}{C(x)} \int_0^{x-} \frac{C_n(y) - C(y)}{C^2(y)}\,F_n^*(dy)\,F_n^*(dx) + o_P(n^{-1/2}).$$

We are now ready to prove Theorem 1.1.

Proof of Theorem 1.1. Corollaries 3.1, 3.2 and 3.3 yield representations of the relevant
terms $S_{n2}$, $S_{n3}$ and $S_{n4}$ as V-statistics. As is known (see Serfling (1980)), a V-statistic
equals a U-statistic, up to an error $o_P(n^{-1/2})$. If, in $S_{n2}$, $S_{n3}$ and $S_{n4}$, we replace $F_n^*$
by $F^*$, we come up with degenerate U-statistics which are all of the order $o_P(n^{-1/2})$.
Collecting the leading terms and applying Fubini's theorem then yields the proof of
Theorem 1.1 with $\hat F_n$ instead of $F_n$.
As for $F_n$, apply the inequality

$$\left| \prod_{j=1}^{i-1} a_j - \prod_{j=1}^{i-1} b_j \right| \le \sum_{j=1}^{i-1} |a_j - b_j|, \quad 0 \le a_j, b_j \le 1,$$
to the weights of $F_n$ and $\hat F_n$. It follows that

$$\left| \int \varphi\,d(F_n - \hat F_n) \right| \le n^{-1} \int \frac{|\varphi(x)|}{C_n(x)} \int_0^{x-} \frac{F_n^*(dy)}{C_n^2(y)}\,F_n^*(dx).$$

Use Lemma 3.1 again to replace $C_n$ with $C$ and apply the SLLN (for U-statistics) to get
that the last term is $O_P(n^{-1})$. This completes the proof of Theorem 1.1. □



Proof of Lemma 1.1. The proof proceeds by induction on $i$. For $i = 1$ and $j = 1$, the
left-hand side of (1.16) equals $1/nC_n^U(U_{11:n})$, while the right-hand side equals $1/nC_n(X_{1:n})$. The assertion
then follows from (1.14). For $i = 1$ and $j = 2$, the left-hand side of (1.16) equals, by (1.8)
and (1.15),

$$\frac{(nC_n^U(U_{11:n}) - 1)/(nC_n^U(U_{11:n}))}{nC_n^U(U_{12:n})} = \frac{1}{nC_n^U(U_{11:n})} = \frac{1}{nC_n(X_{1:n})}.$$

For $i = 1$ and $j > 2$, the proof follows the same pattern, by repeated use of (1.15). Hence,
the assertion of Lemma 1.1 holds true for $i = 1$. Assuming that (1.16) holds true for all
indices less than or equal to $i$, we obtain, by (1.7),

$$1 - F_n^U(U_{id_i:n}) = 1 - F_n(X_{i:n}).$$

This may be used as a starting point to show (1.16) for $i + 1$. Again, make repeated use
of (1.14) and (1.15). □